Skip to content

Added function for parameter validation#38

Merged
KevinMenden merged 19 commits intonf-core:dsl2from
KevinMenden:validate-params
May 4, 2021
Merged

Added function for parameter validation#38
KevinMenden merged 19 commits intonf-core:dsl2from
KevinMenden:validate-params

Conversation

@KevinMenden
Copy link
Copy Markdown
Contributor

PR checklist

  • This comment contains a description of changes (with reason)

This adds the Validation class in in the lib folder, which checks parameter for validity.

Currenty supports:

  • check that types between the schema and given parameters are matching
  • issues warnings for unexpected parameters
  • check that a valid choice is given for enum fields
  • check whether core nextflow flags are specifieds as in params

It does not currently exit the program yet, and needs some more testing, so I'M keeping this just as draft for now.

@KevinMenden
Copy link
Copy Markdown
Contributor Author

Nevermind the checks in the config - just did some testing, will change back.

Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
@KevinMenden KevinMenden marked this pull request as ready for review December 16, 2020 07:44
Copy link
Copy Markdown
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, but you're starting to hit some of the JSON Schema complexities that are not obvious at first. The more I look through your code the more potential future problems I can see with various different schema structures 😓 I fought with stuff like the varying hierarchies of the required keyword and different structure of definitions and property obejcts for a long time over the summer.

If you can stomach it, the JSON Schema validator package should have dealt with all of this complexity for you already 😉

Comment thread lib/Validation.groovy Outdated
Comment thread nextflow.config Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
@ewels ewels mentioned this pull request Dec 16, 2020
2 tasks
Comment thread nextflow_schema.json Outdated
@KevinMenden
Copy link
Copy Markdown
Contributor Author

Okay, looks like there's a bit more work left to-do than I expected :-)

Will have another go at the schema validator today, see if I can get that running and if it fixes the issues. Then try to make the code more robust.

@KevinMenden
Copy link
Copy Markdown
Contributor Author

Okay it's now working and not failing for false-strings, memory objects and the like.
Will still have to do some cleaning and double-checking though ...

Copy link
Copy Markdown
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking awesome! Much happier with this implementation, great job 😊

I wonder if we should have options to skip this function completely, and to hide warnings about unrecognised parameters (maybe specific ones - a param to specify additional expected params).

I'm also a little nervous about the library grab thing in offline environments which is something I only just thought about 😕 @pditommaso - any suggestions about importing Java libraries like this in workflows?

Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Comment thread lib/Validation.groovy Outdated
Co-authored-by: Phil Ewels <phil.ewels@scilifelab.se>
@KevinMenden
Copy link
Copy Markdown
Contributor Author

Yes, the skip function should be easy to implement. And yes it's probably nicer to omit those warnings and only show in case something goes wrong!

I will also have to test this a bit more with more fancy pipelines before making a PR over at tools.
And I ended up copying the params anyway because that's easier to filter than a JSON object - but I'm not entirely trusting this copy yet (fear that it might be too shallow) - so I will have to do some more testing there as well.

@KevinMenden
Copy link
Copy Markdown
Contributor Author

Alright did some more testing and all the different schema types seem to validate correctly.
The only thing I might add as well is to check unrecognised parameters for similarity with correct params, as you suggested Phil.

Apart from that it should be fine - let me know if you can still find some bugs :-)

If we were to add this in the current version to tools, we would need to:

  • add the groovy function to lib
  • adjust the main.nf template to include the validation code
  • add the validate_schema param to nextflow.config by default
  • extend the nf-core download utility to get those java dependencies
  • add patterns to the memory and duration objects so that they can be properly validated

I was also wondering whether we should put the complete function in Schema.groovy? I made a separate script mainly because it was easier to test this way .

@ewels
Copy link
Copy Markdown
Member

ewels commented Jan 7, 2021

  • extend the nf-core download utility to get those java dependencies

How would this work?

@KevinMenden
Copy link
Copy Markdown
Contributor Author

It should be enough to just run this once:

grape install org.json json [20201115]
grape install org.everit.json org.everit.json.schema [1.5.1]

That installs all necessary dependencies.
Only thing I just realized now, and that might be a problem, is that grape is not installed if you just install nextflow, it comes with the complete groovy installation ... so that might be an issue, will have a closer look

@KevinMenden
Copy link
Copy Markdown
Contributor Author

There's also the NXF_GRAB nextflow env variable, which: "Provides extra runtime dependencies downloaded from a Maven repository service."
So it might also be possible to use that for downloading the dependencies somehow.

Well I will do some testing today. If the command line version doesn't work, then a dirty workaround would be to fire up a small NF script which get's the dependencies via @grab .
Really not quite sure how all of this groovy magic is working :D

@KevinMenden
Copy link
Copy Markdown
Contributor Author

KevinMenden commented Jan 8, 2021

Okay now I just saw your comments - continuing in the main thread so it's easier to follow.

Okay then that's a problem with downloading... also my last two comments won't help solving then I think. Not quite sure about the NXF_GRAB but probably that won't help either.

I'm pretty sure it pulls some other libraries unfortunately :/
Will try anyway. Way to go would then be to download the jars in case we need them and put them to lib/, I guess

Considering that this parameter checking is a nice, but not a required functionality, I'm wondering whether we absolutely need this for offline use? Could just add a warning that "parameter validation not available when running offline" or something like that. Just in case we need to add too much bulk in order to get it to work.

Alright I'll try to play around with the JARs now :)

@KevinMenden
Copy link
Copy Markdown
Contributor Author

As I feared json-schema needs other dependencies, and it fails because of that (otherwise imports seem to be fine).

Not sure if that can be resolved, just putting these dependencies into lib/ as well doesn't seem to work. Will keep looking for ways to solve this.

@KevinMenden
Copy link
Copy Markdown
Contributor Author

Okay turns out we really need those dependencies somehow because groovy fails even if we don't execute the code. It still checks for the libraries.

When specifying the dependencies with NXF_GRAB, nextflow dowloads them to .nextflow/capsule/deps and running the pipeline offline works. It's similar to specifying them with @Grab inside the groovy script, which saves it to .groovy/grapes

When NXF_GRAB is defined, executing any nextflow command will download the dependencies to .nextflow/capsule/deps.

Currently the only way I see to make this work is to download the dependencies in one of the two ways mentioned above when running nf-core download, shipping it with the downloaded archive, and then telling the user to move the dependencies to the .nextflow/capsules/deps directory.
That's not nice because it's extra work for the user but it's the only way I can see right now, considering that putting the JAR files to lib didn't work.

Think I will post this to slack in the hope for more ideas :)

@KevinMenden KevinMenden merged commit 5838709 into nf-core:dsl2 May 4, 2021
mashehu pushed a commit that referenced this pull request Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants